Systems and methods of capturing eye-gaze data

ABSTRACT

Systems and methods are provided for collecting eye-gaze data for training an eye-gaze prediction model. The collecting includes selecting a scan path that passes through a series of regions of a grid on a screen of a computing device, moving a symbol as an eye-gaze target along the scan path, and receiving facial images at eye-gaze points. The eye-gaze points are uniformly distributed within the respective regions. Areas of the regions that are adjacent to edges and corners of the screen are smaller than areas of other regions. The difference in areas shifts centers of the regions toward the edges, increasing the density of data closer to the edges. The scan path passes through locations in proximity to the edges and corners of the screen for capturing more eye-gaze points in that proximity. The methods interactively enhance variations of facial images by displaying instructions to the user to perform specific actions associated with the face.

BACKGROUND

Traditional eye-gaze tracking systems collect eye-gaze data to calibrate accuracy of eye tracking. For example, some systems prompt an operator of a computing device to use a calibration application, which causes the computing device to display a gaze target (e.g., a red dot) at coordinates across the screen of the computing device. The operator is prompted to look at and follow the gaze target as the computing device moves the gaze target to different locations on the screen, one at a time, in an evenly distributed manner.

The traditional systems face issues of degrading accuracy with evenly distributed eye-gaze data. The degradation occurs particularly when the eyes look at locations at the edges of the screen. Furthermore, a smaller number of neighboring eye-gaze target points in the training data near the screen boundaries causes the deep learning algorithm to predict an eye-gaze location less accurately. In practice, the less accurate prediction of eye-gaze points at edges and corners of the screen becomes significant because users rely on the edges and corners in common operations. For example, some interactive system icons and buttons (e.g., a start button or a close button) appear at a corner of the screen (e.g., at the lower left corner of the screen for start, the upper right for close).

Increased accuracy in predicting eye-gaze locations requires more training data at select regions of the screen than at other regions. Furthermore, capturing eye-gaze data as training data needs to be completed in minimal time to prevent stress upon the operator performing the non-substantive operations (e.g., calibrating eye-gazing rather than using applications to perform a desired task) on the computing device. Thus, developing a technology that better meets these requirements of capturing necessary eye-gaze training data and providing ease of use would be desirable.

It is with respect to these and other general considerations that the aspects disclosed herein have been made. Also, although relatively specific problems may be discussed, it should be understood that the examples should not be limited to solving the specific problems identified in the background or elsewhere in this disclosure.

SUMMARY

According to the present disclosure, the above and other issues are resolved by generating a grid on a screen of a computing device with a series of regions with predetermined aspects in the grid and selecting a scan path that passes through one or more randomly generated gaze points within respective regions without crossing its own path. Centers of the respective regions represent the overall average location or expected value over sample gaze points. The computing device displays an eye-gaze target that traverses along the path while guiding attention of the operator. The computing device captures a series of images or video image data as the operator follows the moving eye-gaze target, and determines eye-gaze locations as training data to train a gaze prediction model.

The gaze-point data indicate a uniform distribution over the series of regions in the grid on the screen. Areas of the respective regions toward the center of the screen are larger than areas of the regions adjacent to the corners and the edges of the screen to capture more eye-gaze data toward the corners and the edges of the screen while maintaining the uniform distribution over the series of regions in the grid. This way, the training data include eye-gaze data with enhanced density near screen boundaries. The non-overlapping scan path prevents clustering of the training data. The non-overlapping scan path further minimizes the amount of time needed to capture the gaze-point data. Each gaze point is captured once while the path covers the whole screen.

The scan path traverses across multiple screens as a combined screen when the computing device includes multiple displays. When the computing device includes multiple cameras, the present disclosure uses facial images from the multiple cameras and determines eye-gaze locations based on the multiple facial images for higher accuracy.

The methods further integrate an interactive mode in collecting eye-gaze data by interactively providing instructions to the user to perform specific actions (e.g., holding the face still, moving and rotating the face, and the like).

This Summary is provided to introduce a selection of concepts in a simplified form, which is further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Additional aspects, features, and/or advantages of examples will be set forth in part in the following description and, in part, will be apparent from the description, or may be learned by practice of the disclosure.

BRIEF DESCRIPTIONS OF THE DRAWINGS

Non-limiting and non-exhaustive examples are described with reference to the following figures.

FIG. 1 illustrates an overview of an example system for capturing eye-gaze data as training data in accordance with aspects of the present disclosure.

FIGS. 2A-D illustrate examples of a series of regions in a grid on the screen in accordance with aspects of the present disclosure.

FIGS. 3A-B illustrate examples of a series of regions in a grid on the screen in accordance with aspects of the present disclosure.

FIG. 4 illustrates an example of a series of regions in a grid across multiple screens in accordance with aspects of the present disclosure.

FIGS. 5A-B illustrate examples of interactively capturing eye-gaze data in accordance with aspects of the present disclosure.

FIG. 6 illustrates an example of a method for capturing eye-gaze data for training an eye-gaze prediction model in accordance with aspects of the present disclosure.

FIG. 7 is a block diagram illustrating example physical components of a computing device with which aspects of the disclosure may be practiced.

FIG. 8A is a simplified diagram of a mobile computing device with which aspects of the present disclosure may be practiced.

FIG. 8B is another simplified block diagram of a mobile computing device with which aspects of the present disclosure may be practiced.

DETAILED DESCRIPTION

Various aspects of the disclosure are described more fully below with reference to the accompanying drawings, which form a part hereof, and which show specific example aspects. However, different aspects of the disclosure may be implemented in many different ways and should not be construed as limited to the aspects set forth herein; rather, these aspects are provided so that this disclosure will be thorough and complete and will fully convey the scope of the aspects to those skilled in the art. Aspects may be practiced as methods, systems, or devices. Accordingly, aspects may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.

Predicting eye-gaze locations accurately using an eye-gaze prediction model depends on training data, including the quality of the camera(s) (e.g., capturing both eyes, the face, and the head pose in space relative to the camera), illumination, background noise, and the richness of the training dataset (e.g., size and diversity).

As discussed in more detail below, the present disclosure relates to capturing eye-gaze data as training data for training an eye-gaze prediction model. In particular, the present technology includes selecting a scan path along which an eye-gaze target moves while capturing a video of an operator's facial image for determining an eye-gaze location.

The present disclosure addresses the problem of capturing training data that improves accuracy in predicting eye-gaze locations. The disclosed technology enables capturing data for training the machine learning models, particularly based on differences in region areas. In aspects, generating training data may include data augmentation, which generates training data based on a small set of captured data. The disclosed technology captures eye-gaze data using an eye-gaze target that moves along a scan path. The scan path passes through a series of regions in a grid for capturing facial images at eye-gaze points. The eye-gaze points are uniformly and randomly distributed in respective regions. Dimensions of the regions may be adjusted to raise the density of the eye-gaze points. In aspects, areas of the regions along edges and corners of the screen may be smaller than areas of the regions toward the center of the screen to capture more eye-gaze data toward the edges and corners. The screen may be an aggregate screen that includes multiple screens. Facial images from multiple cameras with different wavelength sensitivity (e.g., visible spectrum or infrared) may be captured for generating training data. The disclosed technology may enhance variations of facial images by interactively instructing the operator to perform specific actions associated with the face and receiving additional facial images based on the actions by the operator. The disclosed technology may also track the quality and diversity of the acquired data against desired benchmarks and present the operator with appropriate capture modes accordingly.

FIG. 1 illustrates an overview of an example system 100 for capturing eye-gaze data for training an eye-gaze prediction model in accordance with aspects of the present disclosure. System 100 represents a system using a grid with a series of regions and a scan path that passes through centers of respective regions of the series of regions for displaying an eye-gaze target. System 100 includes an image input device controller 102, a display controller 104, a pointer output controller 106, an eye-gaze data collector 110, and a data bus 140 that connects the eye-gaze data collector 110 with the respective controllers.

The image input device controller 102 controls one or more image input devices (e.g., an RGB/visible light camera, an IR camera, etc.). In aspects, the RGB camera captures image frames for facial images and/or a video stream of a face of an operator operating the computing device. In some aspects, the image input device controller 102 controls and concurrently captures image frames from multiple image input devices. In aspects, the disclosed technology may include existing and/or future camera technology, including pulsed emitters (e.g., light emitting diodes), which may be used to enhance accuracy and precision of results and mitigate low environment lighting or high light interference (e.g., sunlight), and/or dynamic shutters or filters.

The display controller 104 controls a screen of the computing device. The display controller 104 may display an eye-gaze indicator (e.g., a pointer output) that indicates where the operator is looking. The display controller 104 may also display an eye-gaze target (e.g., a red dot icon) to prompt and guide the operator to look at a specific location of the screen.

The pointer output controller 106 controls a location and a shape of the pointer output on the screen of the computing device. In aspects, the pointer output may represent where the operator is looking on the screen of the computing device. In some other aspects, the pointer output may represent a cursor of a mouse or other pointing devices.

The eye-gaze data collector 110 includes a scan path selector 112, a scan path store 114, an eye-gaze target generator 116, a facial image receiver 118, an eye-gaze training data generator 120, and an eye-gaze training data database 122. The eye-gaze data collector 110 further includes an eye-gaze prediction model trainer 124, a trained eye-gaze prediction model 128, and an eye-gaze prediction model transmitter 126.

The scan path selector 112 selects a scan path from the scan path store 114. The scan path store 114 stores a set of predefined scan paths. In aspects, a scan path represents a path that passes through uniformly and randomly generated gaze targets in the respective regions of the series of regions of the grid on the screen of the computing device. A center of a region represents an expected value of uniformly distributed random points in the region. By capturing eye-gaze data around the centers of the regions, the system 100 generates training data with a uniform distribution on the screen of the computing device. Some scan paths pass through corners and along edges of the screen for a longer duration than other scan paths. In aspects, the scan path selector 112 may select a particular scan path at random. Providing variations in moving the eye-gaze target at different times may keep the operator's attention. Additionally or alternatively, the scan paths have no or minimal abrupt changes in the directions of gaze target movement. The reduced or eliminated abrupt change makes the target easier for the operator to follow and mimics a more real-world setting of a user interface. The scan path selector 112 may select a scan path that passes near corners and along edges of the display when the distribution of the captured eye-gaze locations indicates a scarcity of captured eye-gaze locations near the corners and along the edges.
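
By way of illustration only, the following Python sketch shows one possible edge-aware selection heuristic. The function name `select_scan_path`, the `paths` dictionary, and the convention of prefixing edge-hugging path ids with "edge" are hypothetical choices, not details of the disclosure:

```python
import random

def select_scan_path(paths, captured_points, screen_w, screen_h,
                     margin=0.1, min_edge_fraction=0.25):
    """Pick a scan path, preferring edge-heavy paths when captured
    eye-gaze points are scarce near the screen boundary.

    `paths` maps a path id to a list of (x, y) waypoints; ids starting
    with "edge" are assumed (hypothetically) to hug the boundary.
    """
    def near_edge(x, y):
        return (x < margin * screen_w or x > (1 - margin) * screen_w or
                y < margin * screen_h or y > (1 - margin) * screen_h)

    if captured_points:
        edge_fraction = (sum(near_edge(x, y) for x, y in captured_points)
                         / len(captured_points))
    else:
        edge_fraction = 0.0

    if edge_fraction < min_edge_fraction:
        # Too few border samples: restrict the draw to edge-heavy paths.
        candidates = [pid for pid in paths if pid.startswith("edge")] or list(paths)
    else:
        candidates = list(paths)
    return paths[random.choice(candidates)]
```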

In aspects, the disclosed technology generates a grid with a series of regions based on an aspect ratio of the screen of the computing device. A dimension of the grid may be, for example, four by three when an aspect ratio of the screen is 16:9, generating a total of twelve regions. The series of regions may be of equal size. Additionally or alternatively, regions along the edges and the corners of the screen may be smaller in area than other regions in the grid.
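
A minimal sketch of such an equal-size grid, assuming Python and pixel coordinates with the origin at the upper left; the helper name `make_grid` is illustrative:

```python
def make_grid(screen_w, screen_h, cols=4, rows=3):
    """Divide the screen into equal regions and return each region's
    bounding box and center (the expected value of uniform samples)."""
    cell_w, cell_h = screen_w / cols, screen_h / rows
    regions = []
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * cell_w, r * cell_h
            regions.append({"bbox": (x0, y0, x0 + cell_w, y0 + cell_h),
                            "center": (x0 + cell_w / 2, y0 + cell_h / 2)})
    return regions

# A 1920x1080 (16:9) screen with a 4-by-3 grid yields twelve 480x360 regions.
grid = make_grid(1920, 1080)
```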

The eye-gaze target generator 116 generates an eye-gaze target that moves along the selected scan path. In aspects, the eye-gaze target moves along the scan path at a constant velocity. In some other aspects, the eye-gaze target moves at varying speeds, slower as the eye-gaze target is in proximity to corners of the screen and making sharp turns. In aspects, the eye-gaze target may be a symbol (e.g., an icon) that keeps the attention of the operator. The eye-gaze target generator 116 further indicates the eye-gaze target on a screen of the computing device. In aspects, the eye-gaze target generator 116 transmits instructions to the display controller 104 to display the eye-gaze target on the screen of the computing device. In some aspects, the system 100 displays the eye-gaze target without displaying the scan path. By displaying the eye-gaze target without the scan path, the system 100 enables the operator to focus on following the eye-gaze target as the eye-gaze target moves across the screen.
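
The constant-velocity movement can be sketched as interpolation along a polyline. This Python fragment is an assumed implementation, not the disclosed one; `target_position` is a hypothetical helper:

```python
import math

def target_position(waypoints, speed, t):
    """Return the (x, y) position of the eye-gaze target t seconds after
    it starts moving along the polyline `waypoints` at constant speed
    (pixels per second). Clamps to the final waypoint when finished."""
    remaining = speed * t
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if remaining <= seg:
            f = remaining / seg if seg else 0.0
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        remaining -= seg  # move on to the next segment
    return waypoints[-1]
```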

The facial image receiver 118 receives a facial image from the image input device controller 102. The facial image includes a face of the operator following the eye-gaze target on the screen of the computing device. In aspects, a camera associated with the computing device captures the facial image. In aspects, the facial image receiver 118 continuously receives the facial image as frames in a video data stream while the eye-gaze target moves along the scan path. In contrast, a traditional system may capture facial images only when the system displays the eye-gaze target at randomly generated coordinates. The continuous receiving of the facial images as a video stream enables capturing variations and additional data points in the training data.

The eye-gaze training data generator 120 generates eye-gaze training data based on location information of the eye-gaze target and the received facial image data in which the operator is looking at the symbol. The eye-gaze training data generator 120 stores the training data in the eye-gaze training data database 122.
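
A sketch of how such training records might be assembled, assuming the frames, target positions, and timestamps are captured as parallel lists; the record layout is illustrative only:

```python
def build_training_records(frames, positions, timestamps):
    """Pair each video frame with the target's on-screen position at the
    frame's timestamp. The (x, y) target position serves as the
    ground-truth label for the frame."""
    records = []
    for frame, (x, y), ts in zip(frames, positions, timestamps):
        records.append({"image": frame, "gaze_x": x, "gaze_y": y, "t": ts})
    return records
```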

The eye-gaze prediction model trainer 124 trains a gaze prediction model using the eye-gaze training data database 122. In aspects, the eye-gaze prediction model trainer 124 updates parameters of a convolutional neural network and a series of fully connected neural networks based on the eye-gaze training data database 122. The eye-gaze prediction model trainer 124 stores the parameters for the neural networks in the trained eye-gaze prediction model 128. In aspects, the eye-gaze training data database 122 may depend on the operator.
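
One plausible shape for such a model, sketched in PyTorch (an assumption; the disclosure names no framework). The layer sizes, learning rate, and class name `GazeNet` are arbitrary illustrations:

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Convolutional features followed by fully connected layers that
    regress a normalized (x, y) screen coordinate from a face crop."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 2),  # predicted (x, y), normalized to [0, 1]
        )

    def forward(self, x):
        return self.head(self.features(x))

model = GazeNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(images, targets):
    """One update: images are (N, 3, H, W) face crops, targets are (N, 2)."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```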

The eye-gaze prediction model transmitter 126 transmits the trained eye-gaze prediction model 128 to an eye-gaze tracker (not shown) for deployment.

As will be appreciated, the various methods, devices, applications, features, etc., described with respect to FIG. 1 are not intended to limit the system 100 to being performed by the particular applications and features described. Accordingly, additional controller configurations may be used to practice the methods and systems herein, and/or features and applications described may be excluded without departing from the methods and systems disclosed herein.

FIGS. 2A-B illustrate examples of a series of regions of a grid in accordance with aspects of the present disclosure. FIG. 2A illustrates an example of a grid 202 with a series of regions (regions 206A-M) on a screen of a computing device. In aspects, the regions in the grid may be equal in area, with the same aspect ratio as the screen. For example, the screen may be 16:9 in aspect ratio, and the grid may be 4 by 3. Accordingly, each of the regions in the grid may have the same aspect ratio of 4:3. A center 204 (or a primary target point) of the region 206A is a location that represents an expected value of uniformly distributed points in the region 206A.

FIG. 2B illustrates an example of the grid 202 with a scan path 250 along which the eye-gaze target moves while capturing eye-gaze data for training the eye-gaze prediction model. The scan path 250 starts at the center of the region 206A and passes through the centers of a series of regions: starting at region 206A, followed by region 206B, region 206E, region 206J, region 206F, region 206C, region 206D, region 206G, region 206K, region 206L, region 206H, and region 206M. The scan path 250 ends at the center of the region 206M. In some aspects, the scan path 250 passes through the uniformly and randomly generated gaze target in a particular region only once. In aspects, the center of each region shows the average location of gaze points within that region. In practice, the gaze points are generated randomly within each region for training to achieve greater diversity in the dataset. During calibration, these centers could be used directly as is.
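
A sketch of drawing one uniformly distributed gaze point per region along a given region order, reusing the hypothetical `regions` structure from the grid sketch above:

```python
import random

def sample_scan_path(regions, order):
    """Build a concrete scan path by drawing one uniformly distributed
    gaze point inside each region, visited in the given order. Over many
    sessions the samples in a region average out to its center."""
    path = []
    for idx in order:
        x0, y0, x1, y1 = regions[idx]["bbox"]
        path.append((random.uniform(x0, x1), random.uniform(y0, y1)))
    return path
```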

The traditional system displays a symbol at target coordinates on the screen and captures a facial image as the operator looks at the symbol. In contrast, the disclosed technology moves the symbol along the path and captures images while the operator follows the moving symbol. The captured eye-gaze data in the disclosed technology still maintain the randomness of data points because the scan path passes through uniformly and randomly generated gaze targets of all the regions across the screen. The disclosed technology captures variations of eye-gaze data and additional data points in the training data.

In aspects, the scan path 250 generally does not cross (e.g., intersect) its own path. Capturing eye-gaze data along the non-overlapping path prevents the eye-gaze data from forming clusters. The clusters may form when the training data include multiple facial images from the same eye-gaze location. Each point on the non-overlapping path is unique relative to all other points on the path.
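
A non-self-overlapping path can be verified with a standard segment-intersection test. The following sketch is illustrative; the disclosure does not specify how candidate paths are validated:

```python
def _ccw(a, b, c):
    # Signed area: positive if a->b->c turns counter-clockwise.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _segments_cross(p1, p2, p3, p4):
    # Proper crossing test: endpoints strictly on opposite sides.
    d1, d2 = _ccw(p3, p4, p1), _ccw(p3, p4, p2)
    d3, d4 = _ccw(p1, p2, p3), _ccw(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def is_non_self_overlapping(path):
    """Return True if no two non-adjacent segments of the polyline cross."""
    segs = list(zip(path, path[1:]))
    for i in range(len(segs)):
        for j in range(i + 2, len(segs)):  # skip adjacent segments
            if _segments_cross(*segs[i], *segs[j]):
                return False
    return True
```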

FIG. 2C illustrates another example of a scan path in the grid on the screen in accordance with aspects of the present disclosure. The grid 202 includes twelve regions (e.g., regions 206A-M). In aspects, a scan path 212 starts at the center of region 206J, followed by region 206E, region 206A, region 206K, region 206F, region 206B, region 206L, region 206G, region 206C, region 206M, region 206H, and region 206D. Alternative scan paths reduce the likelihood of the data collection becoming a mundane task in which the operator follows the eye-gaze target moving in the same pattern. Furthermore, alternative scan paths bring enhanced spatial coverage to the acquired training data set.

FIG. 2D illustrates yet another example of a scan path in the grid on the screen in accordance with aspects of the present disclosure. The grid 202 includes twelve regions (e.g., regions 206A-M). In aspects, a scan path 214 starts at the center of region 206J, followed by region 206K, region 206E, region 206A, region 206F, region 206L, region 206M, region 206G, region 206B, region 206C, region 206H, and region 206D. In aspects, the present disclosure maintains multiple scan paths in a database (e.g., the scan path store 114 in FIG. 1).

FIG. 3A illustrates an example of a set of scan paths in a grid on the screen in accordance with aspects of the present disclosure. Some traditional systems often suffer from having low-density data at the boundaries of the screen when collecting training data. In practice, as the operator's visual focus moves away from the camera on the screen, in some systems there may be an intrinsic decay in resolution, which negatively influences the accuracy of determining an eye-gaze location. Having low-density data at the boundaries of the screen worsens the problem because of a lack of data points in areas where the intrinsic resolution decay occurs. The present disclosure forces more data acquisition toward device boundaries (e.g., edges and corners) by making the areas of some regions smaller than others and repositioning the centers of respective regions on the screen. In particular, the disclosed technology generates a grid such that those regions that are adjacent to edges and corners of the screen are smaller in area than the regions that are not adjacent to the edges and the corners.

In aspects, the screen has an aspect ratio of 16:9. A grid may include twelve regions with respective horizontal edges with lengths in the ratio 3:5:5:3 and respective vertical edges with lengths in the ratio 2:5:2. In FIG. 3A, a grid 302 includes twelve regions. Region 306A and region 306D have an aspect ratio of 2:3, region 306B and region 306C have the aspect ratio of 2:5, region 306E and region 306H have the aspect ratio of 5:3, region 306F and region 306G have the aspect ratio of 5:5, region 306J and region 306M have the aspect ratio of 2:3, and region 306K and region 306L have the aspect ratio of 2:5.
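
The 3:5:5:3 and 2:5:2 ratios can be turned into region boundaries as sketched below; `make_weighted_grid` is a hypothetical helper following the ratios given above:

```python
def make_weighted_grid(screen_w, screen_h, col_ratios=(3, 5, 5, 3),
                       row_ratios=(2, 5, 2)):
    """Divide the screen into regions whose widths and heights follow the
    given ratios, so border regions are smaller and their centers sit
    closer to the edges and corners."""
    def edges(total, ratios):
        unit = total / sum(ratios)
        pos, out = 0.0, [0.0]
        for r in ratios:
            pos += r * unit
            out.append(pos)
        return out

    xs, ys = edges(screen_w, col_ratios), edges(screen_h, row_ratios)
    regions = []
    for r in range(len(row_ratios)):
        for c in range(len(col_ratios)):
            x0, x1, y0, y1 = xs[c], xs[c + 1], ys[r], ys[r + 1]
            regions.append({"bbox": (x0, y0, x1, y1),
                            "center": ((x0 + x1) / 2, (y0 + y1) / 2)})
    return regions
```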

A scan path 310A (as shown in a partially dotted path) connects a center 304 of the region 306A and a center 308 of the region 306M, passing through the other centers of regions in the grid 302. A scan path 312A (as shown in a solid path) also connects the same points as the scan path 310A but in a diagonal manner. A scan path 314A (as shown in a dotted path) passes through the same set of centers but starts from region 306J and ends in the region 306M, traversing vertically.

FIG. 3B illustrates an example of a set of scan paths in a grid passing through the corners and the edges of the screen for capturing eye-gaze data points in locations near the corners and the edges. In aspects, the disclosed technology further enhances capturing eye-gaze data in proximity to edges and corners of the screen by designing the scan paths to pass through corners and edges while also passing through the centers of respective regions in the grid. In the grid 302, there are twelve regions, where the regions that are adjacent to the edges and the corners of the grid 302 have smaller areas than the regions toward the center of the grid 302. FIG. 3B includes an example of three scan paths: scan path 310B (shown in a solid path), scan path 312B (shown in a fine dotted path), and scan path 314B (shown in a medium dotted path).

In aspects, the three scan paths respectively start at the upper left corner 320 of the region 306A and end at the lower right corner 322 of the region 306M. Each scan path passes through the centers of the twelve regions in a sequence that is distinct from the other scan paths. Additionally or alternatively, the respective scan paths pass in proximity to edges and corners of the screen.

FIG. 4 illustrates an example of a scan path across multiple displays according to aspects of the present disclosure. The example 400 illustrates a screen based on a combination of four displays in a two-by-two matrix form: an upper left display 402, an upper right display 404, a lower left display 406, and a lower right display 408. The four displays form one screen for the operator of the computing device (and/or computing devices) to interact with using eye-gaze.

A scan path 422 connects a center 418 of an upper left region of the upper left display 402 and a center 420 of a lower right region of the lower right display 408. The scan path 422 passes through regions across the respective displays in a diagonal manner, without crossing its own scan path, passing the centers of respective regions once. Traditional systems typically calibrate eye-gaze tracking among displays independently. The disclosed technology generates a scan path that passes across the multiple displays as one screen for capturing eye-gaze data for training and calibration.

In aspects, the multiple displays may include multiple cameras for capturing facial images. The disclosed technology captures the facial images using the multiple cameras. Training an eye-gaze prediction model using facial images from multiple cameras as training data enhances accuracy in predicting eye-gaze locations because the multiple facial images provide more feature data for eye-gazing than one facial image. In aspects, four facial images are captured: from a camera 410 that is attached to the upper left display 402, a camera 412 attached to the upper right display 404, a camera 414 attached to the lower left display 406, and a camera 416 attached to the lower right display 408. In aspects, a display may include no, one, or multiple cameras. Locations of the cameras are not limited to the top-center of the display. For example, a camera may be placed behind the screen, pointed through the screen.

In some aspects, the disclosed technology simultaneously captures all four video streams of facial images from the four cameras throughout a period when the symbol moves across the four displays along the scan path. In some other aspects, the system may weigh a facial image captured by a camera of the display on which the system displays the symbol more heavily than facial images captured by other cameras. A traditional system may treat eye-gaze data captured by a camera as out-of-screen when the symbol is off-screen from that camera's display. The disclosed technology using facial images from multiple cameras is particularly effective when the computing device places multiple displays adjacent to each other at angles, collectively forming a “curved” screen. In use of the “curved” screen, the use of video streams from the four cameras from different directions improves accuracy even when angular movement of gaze makes a long stride across the displays. Features of some facial images from the multiple facial images may compensate for errors in other facial images.
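
One way such weighting might be realized is a weighted average of per-camera estimates, sketched below under assumed conventions; the boost factor of 2.0 and the `estimates` layout are illustrative guesses, not disclosed values:

```python
def fuse_gaze_estimates(estimates, active_display):
    """Combine per-camera gaze estimates into one location, weighting the
    camera on the display that currently shows the symbol more heavily.

    `estimates` maps a display id to ((x, y), base_weight) in shared
    screen coordinates; the active display's weight is boosted."""
    total_w, fx, fy = 0.0, 0.0, 0.0
    for display_id, ((x, y), w) in estimates.items():
        if display_id == active_display:
            w *= 2.0  # assumed boost factor; would be tuned empirically
        fx, fy, total_w = fx + w * x, fy + w * y, total_w + w
    return (fx / total_w, fy / total_w)
```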

FIGS. 5A and 5B illustrate examples of an interactive eye-gaze capturing system. Enhancing training data for deep learning of eye-gaze tracking includes improving the quality, diversity, and variations of eye-gaze data. Variations of training data may include illumination, distance from the camera, head rotation, head pose, and the like. The variations may depend on the operators of the computing devices. The disclosed technology may include an icon-based interactive process that prompts the operator to perform specific actions as the system captures a video stream of facial images. In aspects, the actions include holding the head still, changing a distance of the face from the camera, changing a position of the face in the facial image being captured by the camera, rotating the face, changing sides of the face, and the like.

FIG. 5A illustrates an example of a part of a screen according to aspects of the present disclosure. An example 500A includes a partial screen 502 showing scan path 504, scan path 506, and scan path 508 in a time lapse for illustrating movement of the eye-gaze target at various times. An eye-gaze target 514A stops after moving along the scan path 504.

The exemplar icon 510 instructs the operator to hold the head still for the upcoming phase of data acquisition. The number 510A indicates the remaining time after which the next phase begins. No facial imagery is captured during this time (or, if captured, it is marked to be discarded), as the user spends this time reading the icon instructions and has attention away from the eye-gaze target (stimulus). The color of the eye-gaze target may change (e.g., grey, dull red, etc.) to indicate that no data is being captured. After the five seconds lapse, the exemplar icon 510 may disappear from the partial screen 502 and the eye-gaze target 514A resumes moving along the scan path 504, capturing images in this new mode. In aspects, the scan path 504 does not appear on the partial screen 502; the operator may just see the eye-gaze target 514A moving at a constant speed as the operator follows. In aspects, there may be one eye-gaze target displayed on the partial screen at a time for guiding the operator's focus on the eye-gaze target.

Similarly, an eye-gaze target 516A stops on the scan path 508. An exemplar icon 512 appears. The exemplar icon 512 instructs the operator to change distance (e.g., move closer to or further from the screen and/or the camera). The eye-gaze target 516A then resumes its movement along the scan path (e.g., as an eye-gaze target 516B). The system captures a video stream of the facial image throughout, while the operator interactively follows the eye-gaze target and performs actions as directed at times. In aspects, the disclosed technology integrates sampling of facial images of the operators at different angles and distances into the eye-gaze sampling. The integration may reduce the time needed to collect information about the operator while keeping the operator's attention on the screen.
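
A sketch of the pause-and-resume capture logic described above, assuming Python; the `CaptureSession` class and its mode names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class CaptureSession:
    """Tags each incoming frame with the current capture mode and drops
    frames received while an instruction icon (and countdown) is shown."""
    mode: str = "hold_still"
    instruction_visible: bool = False
    frames: list = field(default_factory=list)

    def show_instruction(self, mode):
        self.mode, self.instruction_visible = mode, True   # pause capture

    def countdown_finished(self):
        self.instruction_visible = False                   # resume capture

    def on_frame(self, image, target_xy):
        if self.instruction_visible:
            return  # operator is reading the icon; discard this frame
        self.frames.append({"image": image, "target": target_xy,
                            "mode": self.mode})
```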

FIG. 5B illustrates an example of icons that interactively instruct the operator to perform specific actions while capturing eye-gaze data according to aspects of the present disclosure. An icon 550 instructs the operator to hold a natural expression toward the camera. An icon 552 instructs the operator to hold the head still. An icon 554 instructs the operator to change distance to the screen and/or the camera by moving closer or further from the screen and/or the camera. An icon 556 instructs the operator to change a position of the face (e.g., moving to the left, to the right, upward, and downward). An icon 558 instructs the operator to rotate the face (e.g., tilting the face). An icon 560 instructs the operator to change sides (e.g., looking to the right and looking to the left).

FIG. 6 is an example of a method for capturing eye-gaze data for generating training data for training an eye-gaze prediction model in accordance with aspects of the present disclosure. A general order of the operations for the method 600 is shown in FIG. 6. Generally, the method 600 begins with start operation 602 and ends with end operation 622. The method 600 may include more or fewer steps or may arrange the order of the steps differently than those shown in FIG. 6. The method 600 can be executed as a set of computer-executable instructions executed by a computer system and encoded or stored on a computer readable medium. Further, the method 600 can be performed by gates or circuits associated with a processor, an ASIC, an FPGA, an SOC, or other hardware device. Hereinafter, the method 600 shall be explained with reference to the systems, components, devices, modules, software, data structures, data characteristic representations, signaling diagrams, methods, etc., described in conjunction with FIGS. 1, 2, 3, 4, 5, 7, and 8A-B.

Following start operation 602, the method 600 begins with select operation 604, which selects a scan path. In aspects, the select operation 604 may select the scan path from a set of scan paths stored in a scan path store. In aspects, the select operation 604 may select a different scan path at different times for the operator to increase variations of eye-gazing data for training.

Display operation 606 displays a symbol at a start point of the scan path on the screen. In aspects, the symbol may be a red dot or some other shape that stands out on the screen for the operator to maintain focus on. In some aspects, the display operation 606 displays the symbol without displaying the scan path. This way, the operator has less distraction while following the symbol on the screen.

Receive operation 608 receives a facial image of the operator. In aspects, the receive operation 608 receives a video stream of the facial image of the operator as the symbol moves on the screen. Respective frames of the facial image correspond to respective locations of the symbol.

Generate operation 610 generates eye-gaze training data. In aspects, the eye-gaze training data include a facial image and location information of the symbol. The location information serves as ground-truth data. The training data is for training an eye-gaze prediction model to predict an eye-gaze location to be the location based on the facial image.

Store operation 612 stores the eye-gaze training data in the eye-gaze training data database (e.g., the eye-gaze training data database 122 in FIG. 1). In aspects, the eye-gaze training data include one or more facial images associated with a correct eye-gaze location. The eye-gaze training data include a diversity of situations and variations of facial images of the operator to train the eye-gaze prediction model for a variety of usage situations. For example, use of a smart phone or a tablet as a computing device involves a variety of usage scenes (e.g., positions and angles of holding the smart phone, variance in lighting at different times and locations, and facial angles and expressions at time of use).

Move operation 614 moves the symbol along the scan path. In aspects, the move operation 614 moves the symbol at a constant velocity across the screen. The velocity may be slow enough for the operator to follow the moving symbol easily. Furthermore, the velocity may be fast enough to minimize the capture time and thereby maintain the operator's attention to the moving symbol.

The decision operation 616 decides whether the symbol is at the end of the scan path. When the symbol is not at the end of the scan path, the method 600 proceeds to the receive operation 608 and repeats the steps of capturing new eye-gaze data and storing new training data based on the new eye-gaze data. When the symbol is at the end of the scan path, the method 600 may proceed to a train operation 618. Additionally or alternatively, the method 600 may receive a facial image that corresponds to the symbol at the end of the scan path, and generate and store new training data.

The train operation 618 trains an eye-gaze prediction model based on the training data through deep learning. In aspects, the train operation 618 generates parameters for a set of a convolutional neural network and fully connected neural networks. In some aspects, the train operation 618 trains the model to optimize it for a particular operator as a target user of a computing device. For example, smart phones are typically used by one operator and rarely shared with other users.

Transmit operation 620 transmits the trained eye-gaze prediction model for deployment. In aspects, the transmit operation 620 includes updating the set of a convolutional neural network and fully connected neural networks with parameters in the trained eye-gaze prediction model. In aspects, the method 600 may end with end operation 622.

As should be appreciated, operations 602-622 are described for purposes of illustrating the present methods and systems and are not intended to limit the disclosure to a particular sequence of steps, e.g., steps may be performed in a different order, additional steps may be performed, and disclosed steps may be excluded without departing from the present disclosure.

FIG. 7 is a block diagram illustrating physical components (e.g., hardware) of a computing device 700 with which aspects of the disclosure may be practiced. The computing device components described below may be suitable for the computing devices described above. In a basic configuration, the computing device 700 may include at least one processing unit 702 and a system memory 704. Depending on the configuration and type of computing device, the system memory 704 may comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories. The system memory 704 may include an operating system 705 and one or more program tools 706 suitable for performing the various aspects disclosed herein. The operating system 705, for example, may be suitable for controlling the operation of the computing device 700. Furthermore, aspects of the disclosure may be practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system. This basic configuration is illustrated in FIG. 7 by those components within a dashed line 708. The computing device 700 may have additional features or functionality. For example, the computing device 700 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 7 by a removable storage device 709 and a non-removable storage device 710.

As stated above, a number of program tools and data files may be stored in the system memory 704. While executing on the at least one processing unit 702, the program tools 706 (e.g., an application 720) may perform processes including, but not limited to, the aspects described herein. The application 720 includes a scan path selector 722, an eye-gaze target generator 724, a facial image receiver 726, an eye-gaze training data generator 728, and an eye-gaze prediction model trainer 730, as described in more detail with regard to FIG. 1. Other program tools that may be used in accordance with aspects of the present disclosure may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.

Furthermore, aspects of the disclosure may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, aspects of the disclosure may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in FIG. 7 may be integrated onto a single integrated circuit. Such an SOC device may include one or more processing units, graphics units, communications units, system virtualization units, and various application functionality, all of which are integrated (or “burned”) onto the chip substrate as a single integrated circuit. When operating via an SOC, the functionality described herein with respect to the capability of the client to switch protocols may be operated via application-specific logic integrated with other components of the computing device 700 on the single integrated circuit (chip). Aspects of the disclosure may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies. In addition, aspects of the disclosure may be practiced within a general purpose computer or in any other circuits or systems.

The computing device 700 may also have one or more input device(s) 712, such as a keyboard, a mouse, a pen, a sound or voice input device, a touch or swipe input device, etc. The output device(s) 714 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used. The computing device 700 may include one or more communication connections 716 allowing communications with other computing devices 750. Examples of suitable communication connections 716 include, but are not limited to, radio frequency (RF) transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.

The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program tools. The system memory 704, the removable storage device 709, and the non-removable storage device 710 are all computer storage media examples (e.g., memory storage). Computer storage media may include RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the computing device 700. Any such computer storage media may be part of the computing device 700. Computer storage media does not include a carrier wave or other propagated or modulated data signal.

Communication media may be embodied by computer readable instructions, data structures, program tools, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.

FIGS. 8A and 8B illustrate a computing device or mobile computing device 800, for example, a mobile telephone, a smart phone, a wearable computer (such as a smart watch), a tablet computer, a laptop computer, and the like, with which aspects of the disclosure may be practiced. In some aspects, the client utilized by a user (e.g., an operator of the system 100 in FIG. 1) may be a mobile computing device. With reference to FIG. 8A, one aspect of a mobile computing device 800 for implementing the aspects is illustrated. In a basic configuration, the mobile computing device 800 is a handheld computer having both input elements and output elements. The mobile computing device 800 typically includes a display 805 and one or more input buttons 810 that allow the user to enter information into the mobile computing device 800. The display 805 of the mobile computing device 800 may also function as an input device (e.g., a touch screen display). If included as an optional input element, a side input element 815 allows further user input. The side input element 815 may be a rotary switch, a button, or any other type of manual input element. In alternative aspects, the mobile computing device 800 may incorporate more or fewer input elements. For example, the display 805 may not be a touch screen in some aspects. In yet another alternative aspect, the mobile computing device 800 is a portable phone system, such as a cellular phone. The mobile computing device 800 may also include an optional keypad 835. The optional keypad 835 may be a physical keypad or a “soft” keypad generated on the touch screen display. In various aspects, the output elements include the display 805 for showing a graphical user interface (GUI), a visual indicator 820 (e.g., a light emitting diode), and/or an audio transducer 825 (e.g., a speaker). In some aspects, the mobile computing device 800 incorporates a vibration transducer for providing the user with tactile feedback. In yet another aspect, the mobile computing device 800 incorporates input and/or output ports, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.

FIG. 8B is a block diagram illustrating the architecture of one aspect of a computing device, a server (e.g., the eye-gaze data collector 110 in FIG. 1), a mobile computing device, etc. That is, the mobile computing device 800 can incorporate a system 802 (e.g., a system architecture) to implement some aspects. The system 802 can be implemented as a “smart phone” capable of running one or more applications (e.g., browser, e-mail, calendaring, contact managers, messaging clients, games, and media clients/players). In some aspects, the system 802 is integrated as a computing device, such as an integrated personal digital assistant (PDA) and wireless phone.

One or more application programs 866 may be loaded into the memory 862 and run on or in association with the operating system 864. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth. The system 802 also includes a non-volatile storage area 868 within the memory 862. The non-volatile storage area 868 may be used to store persistent information that should not be lost if the system 802 is powered down. The application programs 866 may use and store information in the non-volatile storage area 868, such as e-mail or other messages used by an e-mail application, and the like. A synchronization application (not shown) also resides on the system 802 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 868 synchronized with corresponding information stored at the host computer. As should be appreciated, other applications may be loaded into the memory 862 and run on the mobile computing device 800 described herein.

The system 802 has a power supply 870, which may be implemented as one or more batteries. The power supply 870 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.

The system 802 may also include a radio interface layer 872 that performs the function of transmitting and receiving radio frequency communications. The radio interface layer 872 facilitates wireless connectivity between the system 802 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio interface layer 872 are conducted under control of the operating system 864. In other words, communications received by the radio interface layer 872 may be disseminated to the application programs 866 via the operating system 864, and vice versa.

The visual indicator 820 (e.g., an LED) may be used to provide visual notifications, and/or an audio interface 874 may be used for producing audible notifications via the audio transducer 825. In the illustrated configuration, the visual indicator 820 is a light emitting diode (LED) and the audio transducer 825 is a speaker. These devices may be directly coupled to the power supply 870 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 860 and other components might shut down to conserve battery power. The LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device. The audio interface 874 is used to provide audible signals to and receive audible signals from the user. For example, in addition to being coupled to the audio transducer 825, the audio interface 874 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with aspects of the present disclosure, the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below. The system 802 may further include a video interface 876 that enables an operation of an on-board camera 830 to record still images, video streams, and the like.

A mobile computing device 800 implementing the system 802 may have additional features or functionality. For example, the mobile computing device 800 may also include additional data storage devices (removable and/or non-removable) such as magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 8B by the non-volatile storage area 868.

Data/information generated or captured by the mobile computing device 800 and stored via the system 802 may be stored locally on the mobile computing device 800, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio interface layer 872 or via a wired connection between the mobile computing device 800 and a separate computing device associated with the mobile computing device 800, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated, such data/information may be accessed via the mobile computing device 800 via the radio interface layer 872 or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.

The description and illustration of one or more aspects provided in this application are not intended to limit or restrict the scope of the disclosure as claimed in any way. The aspects, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of the claimed disclosure. The claimed disclosure should not be construed as being limited to any aspect, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate aspects falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed disclosure.

The present disclosure relates to systems and methods for collecting eye-gaze data as training data for a gaze prediction model according to at least the examples provided in the sections below. The method comprises selecting a scan path from a set of predetermined scan paths, wherein each scan path is non-self-overlapping on a screen of a device, and wherein the scan path traverses across a series of regions in a grid on the screen; displaying a symbol as an eye-gaze target on the screen, wherein the symbol moves along the scan path for guiding attention of the operator; receiving a combination of eye-gaze point data and input images associated with a plurality of points along the scan path as training data for the eye-gaze prediction model, wherein the eye-gaze point data indicate a uniform distribution over the series of regions in the grid on the screen; training the eye-gaze prediction model using the training data, wherein the eye-gaze prediction model includes data associated with parameters in one or more neural networks; and updating the parameters in the one or more neural networks using the trained eye-gaze prediction model. A horizontal-vertical dimension ratio of each of the series of regions in the grid and a horizontal-vertical dimensional ratio of the screen are identical. Areas of one or more regions in the series of regions are identical. Areas of one or more regions in the series of regions adjacent to at least an edge of the screen are smaller than other regions in the series of the regions. The scan path passes through a point in a region of the series of regions, wherein the point represents an expected value of uniformly distributed random eye-gaze points in the region. The screen includes a plurality of screens, and wherein the scan path traverses across a series of regions in a grid on the plurality of screens. The method further comprises displaying the symbol without movement, wherein the symbol is on a scan path; selecting an icon from a set of icons, wherein the set of icons includes an unrestricted movement of a face, a restricted movement of the face, and one or more actions, wherein the one or more actions include change distance, change position, change rotation, and change sides of a face of the operator; displaying the icon at a location adjacent to the symbol; and interactively receiving, in response to the displaying the icon, one or more input images of the operator.

Another aspect of the technology relates to a system for collecting eye-gaze data as training data for an eye-gaze prediction model. The system comprises a processor; and a memory storing computer-executable instructions that when executed by the processor cause the system to: select a scan path from a set of predetermined scan paths, wherein each scan path is non-self-overlapping on a screen of a device, and wherein the scan path traverses across a series of regions in a grid on the screen; display a symbol as an eye-gaze target on the screen, wherein the symbol moves along the scan path for guiding attention of the operator; receive a combination of eye-gaze data and input images associated with a plurality of points along the scan path as training data for the eye-gaze prediction model, wherein the eye-gaze point data indicate a uniform distribution over the series of regions in the grid on the screen; train the eye-gaze prediction model using the training data, wherein the eye-gaze prediction model includes data associated with parameters in one or more neural networks; and update the parameters in the one or more neural networks using the trained eye-gaze prediction model. A horizontal-vertical dimension ratio of each of the series of regions in the grid and a horizontal-vertical dimensional ratio of the screen are identical. Areas of one or more regions in the series of regions are identical. Areas of one or more regions in the series of regions adjacent to at least an edge of the screen are smaller than other regions in the series of the regions. The scan path passes through a point in a region of the series of regions, wherein the point represents an expected value of uniformly distributed random eye-gaze points in the region. The screen includes a plurality of screens, and wherein the scan path traverses across a series of regions in a grid on the plurality of screens. The computer-executable instructions, when executed by the processor, further cause the system to: display the symbol without movement, wherein the symbol is on a scan path; select an icon from a set of icons, wherein the set of icons include an unrestricted movement of a face, a restricted movement of the face, and one or more actions, wherein the one or more actions include change distance, change position, change rotation, and change sides of a face of the operator; display the icon at a location adjacent to the symbol; and interactively receive, in response to the displaying the icon, one or more input images of the operator.

In still further aspects, the technology relates to a computer-readable recording medium storing computer-executable instructions. The computer-executable instructions, when executed by a processor, cause a computer system to: select a scan path from a set of predetermined scan paths, wherein each scan path is non-self-overlapping on a screen of a device, and wherein the scan path traverses across a series of regions in a grid on the screen; display a symbol as an eye-gaze target on the screen, wherein the eye-gaze target moves along the scan path for guiding attention of the operator; receive a combination of eye-gaze data and input images associated with a plurality of points along the scan path as training data for the eye-gaze prediction model, wherein the eye-gaze point data indicate a uniform distribution over the series of regions in the grid on the screen; train the eye-gaze prediction model using the training data, wherein the eye-gaze prediction model includes data associated with parameters in one or more neural networks; and update the parameters in the one or more neural networks using the trained eye-gaze prediction model. A horizontal-vertical dimension ratio of each of the series of regions in the grid and a horizontal-vertical dimensional ratio of the screen are identical. Areas of one or more regions in the series of regions adjacent to at least an edge of the screen are smaller than other regions in the series of the regions. The scan path passes through a point in a region of the series of regions, wherein the point represents an expected value of uniformly distributed random eye-gaze points in the region. The screen includes a plurality of screens, and wherein the scan path traverses across a series of regions in a grid on the plurality of screens. The computer-executable instructions, when executed by the processor, further cause the system to: display the symbol without movement, wherein the symbol is on a scan path; select an icon from a set of icons, wherein the set of icons include an unrestricted movement of a face, a restricted movement of the face, and one or more actions, wherein the one or more actions include change distance, change position, change rotation, and change sides of a face of the operator; display the icon at a location adjacent to the symbol; and interactively receive, in response to the displaying the icon, one or more input images of the operator.

Any of the one or more above aspects may be combined with any other of the one or more aspects. Any of the one or more aspects may be implemented as described herein.
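
As a final non-limiting illustration, the interactive capture of facial-image variations described above might be sketched as follows. The display and camera objects, their methods, and the prompt identifiers are hypothetical interfaces assumed for illustration only; the disclosure names the action categories but does not specify any concrete API.

    # Hypothetical prompt identifiers mirroring the action categories
    # recited above; none of these names come from the disclosure.
    FACE_PROMPTS = [
        "unrestricted_movement",
        "restricted_movement",
        "change_distance",
        "change_position",
        "change_rotation",
        "change_sides",
    ]

    def collect_face_variations(display, camera, symbol_xy, prompts=FACE_PROMPTS):
        """Hold the eye-gaze symbol still, show one action icon at a time
        adjacent to it, and capture an input image after each prompted
        action. display and camera are assumed interfaces, not a real API."""
        images = []
        display.show_symbol(symbol_xy, moving=False)   # symbol without movement
        for prompt in prompts:
            display.show_icon(prompt, adjacent_to=symbol_xy)
            images.append(camera.capture())            # interactively receive an image
            display.clear_icon()
        return images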

What is claimed is:
1. A computer-implemented method for collecting eye-gaze data as training data for an eye-gaze prediction model, the method comprising: selecting a scan path from a set of predetermined scan paths, wherein each scan path is non-self-overlapping on a screen of a device, and wherein the scan path traverses across a series of regions in a grid on the screen; displaying a symbol as an eye-gaze target on the screen, wherein the eye-gaze target moves along the scan path for guiding attention of an operator; receiving a combination of eye-gaze point data and input images associated with a plurality of points along the scan path as training data for the eye-gaze prediction model, wherein the eye-gaze point data indicate a uniform distribution over the series of regions in the grid on the screen; training the eye-gaze prediction model using the training data, wherein the eye-gaze prediction model includes data associated with parameters in one or more neural networks; and updating the parameters in the one or more neural networks using the trained eye-gaze prediction model.
2. The computer-implemented method of claim 1, wherein a horizontal-vertical dimension ratio of each of the series of regions in the grid and a horizontal-vertical dimension ratio of the screen are identical.
3. The computer-implemented method of claim 1, wherein areas of one or more regions in the series of regions are identical.
4. The computer-implemented method of claim 1, wherein areas of one or more regions in the series of regions adjacent to at least an edge of the screen are smaller than other regions in the series of regions.
5. The computer-implemented method of claim 1, wherein the scan path passes through a point in a region of the series of regions, wherein the point represents an expected value of uniformly distributed random eye-gaze points in the region.
6. The computer-implemented method of claim 1, wherein the screen includes a plurality of screens, and wherein the scan path traverses across a series of regions in a grid on the plurality of screens.
7. The computer-implemented method of claim 1, the method further comprising: displaying the symbol without movement, wherein the symbol is on the scan path; selecting an icon from a set of icons, wherein the set of icons includes an unrestricted movement of a face, a restricted movement of the face, and one or more actions, wherein the one or more actions include change distance, change position, change rotation, and change sides of a face of the operator; displaying the icon at a location adjacent to the symbol; and interactively receiving, in response to displaying the icon, one or more input images of the operator.
8. A system for collecting eye-gaze data as training data for an eye-gaze prediction model, the system comprising: a processor; and a memory storing computer-executable instructions that when executed by the processor cause the system to: select a scan path from a set of predetermined scan paths, wherein each scan path is non-self-overlapping on a screen of a device, and wherein the scan path traverses across a series of regions in a grid on the screen; display a symbol as an eye-gaze target on the screen, wherein the eye-gaze target moves along the scan path for guiding attention of an operator; receive a combination of eye-gaze point data and input images associated with a plurality of points along the scan path as training data for the eye-gaze prediction model, wherein the eye-gaze point data indicate a uniform distribution over the series of regions in the grid on the screen; train the eye-gaze prediction model using the training data, wherein the eye-gaze prediction model includes data associated with parameters in one or more neural networks; and update the parameters in the one or more neural networks using the trained eye-gaze prediction model.
9. The system according to claim 8, wherein a horizontal-vertical dimension ratio of each of the series of regions in the grid and a horizontal-vertical dimension ratio of the screen are identical.
10. The system according to claim 8, wherein areas of one or more regions in the series of regions are identical.
11. The system according to claim 8, wherein areas of one or more regions in the series of regions adjacent to at least an edge of the screen are smaller than other regions in the series of regions.
12. The system according to claim 8, wherein the scan path passes through a point in a region of the series of regions, wherein the point represents an expected value of uniformly distributed random eye-gaze points in the region.
13. The system according to claim 8, wherein the screen includes a plurality of screens, and wherein the scan path traverses across a series of regions in a grid on the plurality of screens.
14. The system according to claim 8, wherein the computer-executable instructions, when executed by the processor, further cause the system to: display the symbol without movement, wherein the symbol is on the scan path; select an icon from a set of icons, wherein the set of icons includes an unrestricted movement of a face, a restricted movement of the face, and one or more actions, wherein the one or more actions include change distance, change position, change rotation, and change sides of a face of the operator; display the icon at a location adjacent to the symbol; and interactively receive, in response to displaying the icon, one or more input images of the operator.
15. A computer-readable non-transitory recording medium storing computer-executable instructions that when executed by a processor cause a computer system to: select a scan path from a set of predetermined scan paths, wherein each scan path is non-self-overlapping on a screen of a device, and wherein the scan path traverses across a series of regions in a grid on the screen; display a symbol as an eye-gaze target on the screen, wherein the eye-gaze target moves along the scan path for guiding attention of an operator; receive a combination of eye-gaze point data and input images associated with a plurality of points along the scan path as training data for an eye-gaze prediction model, wherein the eye-gaze point data indicate a uniform distribution over the series of regions in the grid on the screen; train the eye-gaze prediction model using the training data, wherein the eye-gaze prediction model includes data associated with parameters in one or more neural networks; and update the parameters in the one or more neural networks using the trained eye-gaze prediction model.
16. The computer-readable non-transitory recording medium of claim 15, wherein a horizontal-vertical dimension ratio of each of the series of regions in the grid and a horizontal-vertical dimension ratio of the screen are identical.
17. The computer-readable non-transitory recording medium of claim 15, wherein areas of one or more regions in the series of regions adjacent to at least an edge of the screen are smaller than other regions in the series of regions.
18. The computer-readable non-transitory recording medium of claim 15, wherein the scan path passes through a point in a region of the series of regions, wherein the point represents an expected value of uniformly distributed random eye-gaze points in the region.
19. The computer-readable non-transitory recording medium of claim 15, wherein the screen includes a plurality of screens, and wherein the scan path traverses across a series of regions in a grid on the plurality of screens.
20. The computer-readable non-transitory recording medium of claim 15, wherein the computer-executable instructions, when executed by the processor, further cause the computer system to: display the symbol without movement, wherein the symbol is on the scan path; select an icon from a set of icons, wherein the set of icons includes an unrestricted movement of a face, a restricted movement of the face, and one or more actions, wherein the one or more actions include change distance, change position, change rotation, and change sides of a face of the operator; display the icon at a location adjacent to the symbol; and interactively receive, in response to displaying the icon, one or more input images of the operator.